Discovering Common Pathophysiological Processes between COVID-19 and Cystic Fibrosis by Differential Gene Expression Pattern Analysis

Coronaviruses are a family of viruses that infect mammals and birds. Coronaviruses cause infections of the respiratory system in humans, which can be minor or fatal. A comparative transcriptomic analysis has been performed to establish essential profiles of the gene expression of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) linked to cystic fibrosis (CF). Transcriptomic studies have been carried out in relation to SARS-CoV-2 since a number of people have been diagnosed with CF. The recognition of differentially expressed genes demonstrated 8 concordant genes shared between the SARS-CoV-2 and CF. Extensive gene ontology analysis and the discovery of pathway enrichment demonstrated SARS-CoV-2 response to CF. The gene ontological terms and pathway enrichment mechanisms derived from this research may affect the production of successful drugs, especially for the people with the following disorder. Identification of TF-miRNA association network reveals the interconnection between TF genes and miRNAs, which may be effective to reveal the other influenced disease that occurs for SARS-CoV-2 to CF. The enrichment of pathways reveals SARS-CoV-2-associated CF mostly engaged with the type of innate immune system, Toll-like receptor signaling pathway, pantothenate and CoA biosynthesis, allograft rejection, graft-versus-host disease, intestinal immune network for IgA production, mineral absorption, autoimmune thyroid disease, legionellosis, viral myocarditis, inflammatory bowel disease (IBD), etc. The drug compound identification demonstrates that the drug targets of IMIQUIMOD and raloxifene are the most significant with the significant hub DEGs.


Introduction
Coronavirus disease  is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1]. Coronaviruses are a group of viruses that cause sickness ranging from the common cold to severe disorders such as Middle East Respiratory Syndrome (MERS) and SARS. In Wuhan, China, this virus was first found in people exposed to a seafood or wet market [2]. Chinese public health, healthcare communities, and scientists have been rapidly responding, helping us to detect clinical disease and understand the infection epidemiology. First news suggested that transmission between people has been restricted or not existing, but we know that this is happening, even though it remains unclear to what degree [3,4]. As of March 25, 2021, around 125,608,151 COVID-19 cases and 2,759,429 fatalities have been documented [5]. The virus is initially transmitted between humans through close contact, most commonly by minute droplets generated by coughing, sneezing, and talking [6]. COVID-19 has now infected people of all ages in 215 countries around the world [7,8]. The contaminated droplets will disperse and deposit for 1-2 meters across the surfaces. The virus may survive for several days on surfaces, but standard disinfectants such as sodium hypochlorite, hydrogen peroxide, and others destroy it in less than one minute [9]. COVID-19's clinical features range from asymptomatic to severe respiratory distress and multiorgan dysfunction. Conjoiners are also identified. They are also inseparable from other breathing pathogens [10]. Most of the peoples who died with COVID-19 have diseases that still exist, including high blood pressure, diabetes mellitus, and cardiovascular disease [11]. Hypertension (66 percent of fatalities), type 2 diabetes (29.8 percent of deaths), coronary heart disease (27.6 percent of deaths), atrial fibrillation (23.1 percent of deaths), and chronic renal failure are the most frequent comorbidities (20.2 percent of deaths) [12]. The most serious respiratory diseases, according to the Centers for Disease Control and Prevention (CDC), include moderate or severe asthma, preexisting COPD, pulmonary fibrosis, and cystic fibrosis. McClenaghan et al. reported that 181 patients with CF (32 posttransplant) from 19 countries were infected with SARS-CoV-2 on June 13, 2020. SARS-CoV-2 infections had a similar spectrum of symptoms as the general population, with 11 patients hospitalized to critical care and 7 fatalities [13]. On the other hand, in February 2021, Manti et al. submitted a report on the patients with CF diagnosed with SARS-CoV-2. This study was based on 58 patients with CF where 12 patients diagnosed with SARS-CoV-2 [14]. Considering all these aspects, this study was aimed at exploring the common core mechanisms including key genes, common pathways, drug target, and disease pathways.
In this study, we used a comprehensive and quantitative network-based approach to investigate the gene expression influence of SARS-CoV-2 and CF, as well as how these effects may be representative of how they promote the occurrence and development of other disorders across pathways and pathway genes that are also affected in these disorders, thus, equating the impact of SARS-CoV-2 exposure on gene expression with the changed pattern of gene expression observed in CF. The first step was to examine various gene profiles and then filter these genes with networks of gene disorder connections, signaling, and ontological pathways and networks of interactions between protein and protein and TF-gene interactions. All the processes are shortly demonstrated in Figure 1 Figure 2(b)).

Dataset Consideration.
The NCBI gene expression microarray datasets were used to determine a gene expression dysregulation common to SARS-CoV-2 and CF. For this finding, two differentially expressed datasets with accession numbers GSE156544 and GSE107846 were considered [15,16]. The GSE156544 dataset considered as epithelial response to IFN-γ promotes SARS-CoV-2 infection. The dataset stands on the GPL21272 microarray platform (Agilent-048908 8x60K whole genome incl V1-V2 linc BioVac-Safe final 048908) and consists of a total of 8 samples: 6 normal samples and 2 infected samples. Secondhand smoking affects arachidonic acid metabolism in newborns and children with cystic fibrosis, according to GSE107846. GSE107846 also stands on the GPL13369 microarray platform (Illumina Human Whole-Genome DASL HT) and consists of a total of 40 samples: 12 healthy samples and 28 cystic fibrosis-infected samples.

Dataset Filtration and Retrieval of DEGs to Identify
Common DEGs between SARS-CoV-2 and CF. Microarray datasets GSE156544 and GSE147507 are both considered for epithelial response to IFN-γ which promotes SARS-CoV-2 infection, and secondhand smoke alters arachidonic acid metabolism in infants and children with cystic fibrosis, respectively, for this study. For both datasets, GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r/) assists to identify the expected DEGs. GEO2R is a web-based microarray data analysis tool. Comparing contaminated samples against managed samples is carried out in a comparative manner, and comparison is carried out using the limma and GEO query [17] packages from Bioconductor [18] project in the R programming language framework. To determine cutoff criteria for significant DEGs, 0.75 log2 fold change and P value < 0.05 have been considered for both datasets. Afterward, a comparison method between the datasets was made to extract the shared genes between the diseases.

2
BioMed Research International 2.4. Gene Ontology Enrichment and Gene-Associated Pathway Analysis. Gene set enrichment analysis is typically a quantitative and mathematical approach that determines whether a set of determined genes show statistical importance in various biological conditions [19]. Gene ontology (GO) is a bioinformatics resource that provides information about the role of a gene product through ontology representing biological knowledge [20]. Gene ontology (GO) comprises three hierarchies describing the gene function based on biological process (BP), cellular component (CC), and molecular function [21]. For this study, GO terms have been extracted from the Enrichr (https:// maayanlab.cloud/Enrichr/) web-based platform which is a publicly open repository and contains in total 180,184 annotated gene sets from 102 gene set libraries [22]. Kyoto Encyclopedia of Genes and Genome (KEGG) [23], Reactome [24], and WikiPathways [25] are used for cellular descriptive research pathways. This analysis was also extracted from the Enrichr database.

Protein Interaction Network Construction and Analysis.
Protein interaction network is the graphical representation of the connections between proteins where nodes and edges represent proteins and connections, respectively [26]. The backbone of the transduction pathways and networks in a number of physiological processes is protein-protein interactions (PPIs). As the "unwieldy" PPIs are now a possible new class of medicinal drugs, their essential functions in the relaying of cell growth signals in both normal and cancer cells are very significant [27]. Overlapped DEGs were applied to construct protein interaction network and extract protein-protein interaction network (PPIN) from the Net-workAnalyst tool (https://www.networkanalyst.ca/) using the STRING database [28,29]. NetworkAnalyst is a robust  Figure 1: Image workflow of this study's methods. A transcriptomic comparative analysis was performed between SARS-CoV-2 and CF. Two datasets have been collected for SARS-CoV-2 and CF, differentially expressed genes (DEGs) were identified using GEO2R, and both datasets were filtered to normalize datasets and identify shared DEGs. A transcriptomic analysis was performed using shared datasets.

BioMed Research International
web-based platform that enables bench researchers to conduct popular and nuanced meta-analysis of gene expression data using a user-friendly web interface [30]. The collected PPIN was displayed, analyzed, and redesigned using the Cytoscape software tool, which is the most widely used publicly open software tool for visualizing biological networks made up of protein, gene, and other types of connections [31].
2.6. Cytoscape Plugin Algorithm Analysis. Hub genes or key genes are defined as genes with high correlation in the large scale of PPI network [32]. The cytoHubba plugin (http:// apps.cytoscape.org/apps/cytohubba) tool was used to detect highly connected protein node from the PPI network. Degree, Edge Percolated Component, Maximum Neighborhood Component, Density of Maximum Neighborhood Component, Maximal Clique Centrality (MCC), and six centralities (Bottleneck, EcCentricity, Closeness, Radiality, Betweenness, and Stress) are among the 11 topological analysis methods available in cytoHubba. Among them, MCC is a novel suggested technique that performs better in terms of predicting important proteins from the yeast PPI network in terms of precision [33]. Maximal Clique Centrality (MCC) algorithm was applied to determine the hub genes for this study [34]. The PEWCC method assesses the trustworthiness of interaction data before predicting protein complexes using the weighted clustering coefficient idea that is used to determine the complex network or cluster network [35].

miRNA-TF Coregulatory Network Development and
Analysis. The miRNA-transcription factor (TF) coregulatory network was constructed using RegNetwork repository (http://www.regnetworkweb.org/), which provided new insights on the role of the proposed miRNA-TF coregulation in the affected cell of SARS-CoV-2 diagnosed with CF and its cross talk with the surrounding microenvironment [36].
RegNetwork is a repository of human and mouse gene regulatory networks that captures and integrates known regulatory interactions between transcription factors (TFs), microRNAs (miRNAs), and target genes in 25 selected databases. RegNetwork comprises a detailed collection of transcriptional and posttranscriptional regulatory interactions that are experimentally observed or predicted, and the database structure is flexible for possible expansions into gene regulatory networks for other species [37].

Drug Candidate Identification.
A drug candidate has been extracted from the drug signature database (DSigDB) for the significant genes [38]. DSigDB is a publicly opensource online repository which holds 22527 gene sets and consists of 17389 unique compounds covering 19531 genes at present. DSigDB is a database of drug-related and small molecular gene samples focused on improvements in quantitative gene expression and/or drug-induced data. In the following ways, DSigDB varies from current tools: (1) DSigDB genes have been extracted from a range of databases and journals and collected from quantitative drug/

Gene Ontology Analysis and Gene-Associated Pathway
Enrichment. GO enrichment was analyzed using identified overlapped DEGs between SARS-CoV-2 and CF. The results of GO enrichment provide biological process highly associated with negative regulation of endocytosis, cellular response to molecule of bacterial origin, cellular response to lipoteichoic acid, positive regulation of T-helper 2 cell differentiation, cellular response to bacterial lipopeptide, response to lipoteichoic acid, negative regulation of synapse organization, etc. Toll-like receptor binding, peptidoglycan binding, peptidase inhibitor activity, endopeptidase regulator activity, phosphatidylinositol-4,5-bisphosphate 3-kinase activity, etc. are associated with molecular function. Cellular component is mostly engaged with platelet alpha granule lumen, platelet alpha granule, Golgi lumen, membrane raft, and recycling endosome (Figure 4(a), Table 1). On the other hand pathway enrichment analysis of KEGG, Reactome, and WikiPathways has been extracted through Enrichr opensource database [41]. Extracted results showed that rheuma-toid arthritis, Toll-like receptor signaling pathway, pantothenate and CoA biosynthesis, allograft rejection, graft-versushost disease, intestinal immune network for IgA production, mineral absorption, autoimmune thyroid disease, legionellosis, viral myocarditis, inflammatory bowel disease (IBD), innate immune system, gamma-carboxylation of protein precursors, gamma-carboxylation, transport, aminoterminal cleavage of proteins, MyD88 deficiency (TLR2/4), human complement system, Toll-like receptor signaling pathway, regulation of toll-like receptor signaling pathway, ApoE and miR-146 in inflammation and atherosclerosis, macrophage markers, and pathways of KEGG, Reactome, and WikiPpathways are highly connected with the common DEGs (Figure 4(b), Table 2).

Highly Connected Protein in Protein Interaction
Network. The NetworkAnalyst database was used to construct the PPI network through the overlapped DEGs including ROPN1L, VNN2, TLR2, MCTP1, PROS1, LBH, TRPM6, and CD86. The extracted PPI network built with the confidence score 0.90 and network represents connectivity of TLR2, PROS1, CD86, and TRPM6 to other nodes ( Figure 5). The network builds up with 80 nodes and 76 edges where TLR2, PROS1, CD86, and TRPM6 are the seed nodes; from them, TLR2 is the highly connected seed node which consists of 57 nodes. The PPI network is the base of

MCODE and cytoHubba Plugin
Analysis. The cyto-Hubba plugin algorithms were used to predict the key nodes with the MCC method of eleven methods provided by cyto-Hubba. From the result of cytoHubba, four nodes were taken as hub nodes including TLR2, PROS1, CD86, and TRPM6 ( Figure 6). Toll-like receptor 2 is known as TLR2 [42], which consists of the highest score on the Maximal Clique Centrality (MCC) algorithm. The PPI network was visualized to extract the protein complex network using MCODE algorithm. There was no protein complex network showed by the MCODE algorithm.
3.5. Hub Node-Related Data. TLR2 is one of the most difficult immunological receptors to understand, yet it plays an important function in the immune system. TLR2 is a membrane protein and a receptor that is expressed on the surface of certain cells, detects external chemicals, and sends appropriate signals to immune system cells [43]. TLR2 identifies a large number of bacterial, fungal, fungus, and certain endogenous compounds as a membrane surface receptor. A research found that the TLR2/6 agonist INNA-051's prophylactic intranasal administration in a ferret infection model SARS-CoV-2 effectively lowers viral RNA levels in the nose and neck. The findings from their research help the clinical development of prophylactic TLR2/6-based therapy for innate URT immune activating, reducing transmission of SARS-CoV-2 and providing defense against COVID-19 [44]. Protein S (also called S-Protein or Spike Protein) is a vitamin K-dependent plasma glycoprotein produced in the liver. Protein S can be found in two distinct forms: a free form and a complex form bound by protein C4b (C4BP). Protein S codes for the PROS1 gene in humans [45,46]. The coronavirus spike protein S is involved in several biological processes, including antibody-mediated virus neutralization, cell attachment, and cell fusion. An S determinant found in mouse coronaviruses was used to deduce the amino acid  [47]. Cluster Differentiation 86 is a protein that is constitutively expressed on dendritic cells, Langerhans cells, macrophages, B cells, and on other antigenic cells. Cluster Differentiation 86 is also referred to as CD86 and B7-2 [48]. A study in 2020 has demonstrated a dis-tinct signaling event induced by CD80 and CD86 molecules in B cell lymphoma [49]. Another study found that inserting host-derived costimulatory molecules CD80 and CD86 into human immunodeficiency virus type 1 changes the virus's life cycle [50].   Figure 7: The TF-miRNA network build up with TF genes and miRNA. The network conducts 25 TF genes and 80 miRNA. Blue-colored rectangle nodes indicate miRNA, diamond-shaped green-colored nodes indicate TF genes, and red-colored octagon nodes indicate seed nodes. TF genes and miRNA interconnected with 6 seed nodes.  (Figure 7).

IMIQUIMOD and Raloxifene Drug Candidate Highly
Associated with the Hub Genes. The predictive drug signature has been extracted from the DSigDB database using the overlapped DEGs of SARS-CoV-2 and CF including ROPN1L, VNN2, TLR2, MCTP1, PROS1, LBH, TRPM6, and CD86. The result has been taken based on adjusted P value and overlapped DEGs. IMIQUIMOD and raloxifene are mostly significant based on their adjusted P value and overlapped genes that are shown in Table 3.

Discussion
This research explored common connections between SARS-CoV-2 and CF and the role of SARS-CoV-2 on CF. As of 8 March 2021, according to the European Cystic Fibrosis Society, a total of 1126 cases reported CF diagnosed with SARS-CoV-2. This means there are significant connections between SARS-CoV-2 and CF. Considering all these perspectives, this study desired to demonstrate the association between SARS-CoV-2 and CF including GO analysis, pathway enrichment, PPI network construction, hub gene finding, TF-miRNA coregulatory network, and drug candidate identification.
Two microarray datasets for SARS-CoV-2 and CF have been considered for this study. The selected dataset has been filtered to find common DEGs. The GO analysis and pathway enrichment were applied for common DEGs. The GO analysis revealed that the terms of negative regulation of endocytosis, cellular response to molecule of bacterial origin, cellular response to lipoteichoic acid, cellular response to bacterial lipopeptide, Toll-like receptor binding, peptidoglycan binding, peptidase inhibitor activity, endopeptidase regulator activity, phosphatidylinositol-4,5-bisphosphate 3kinase activity, platelet alpha granule lumen, platelet alpha granule, Golgi lumen, membrane raft, and recycling endosome are significantly associated with the overlapped DEGs. Pathway enrichment has been collected from three databases (KEGG, Reactome, and WikiPathways) through the Enrichr repository. The results of pathway enrichment showed that rheumatoid arthritis, Toll-like receptor signaling pathway, pantothenate, CoA biosynthesis, allograft rejection, graftversus-host disease, intestinal immune network for IgA production, mineral absorption, autoimmune thyroid disease, legionellosis, viral myocarditis, inflammatory bowel disease (IBD), innate immune system, gamma-carboxylation of protein precursors, gamma-carboxylation, etc. are associated with the common DEGs.
The protein interaction network is also constructed with the gene association of SARS-CoV-2 and CF for further analysis. The PPI network conducts 80 nodes and 76 edges, from where hub nodes identified include TLR2, PROS1, CD86, and TRPM6. The regulatory networks could play a significant role as a potential biomarker to the various complex types of disease. Common DEGs have been performed to clarify the TF-miRNA coregulatory network which is the interaction of TF genes and miRNA. With 25 TF genes and 80 miRNA candidates, a regulatory network was built.
Finally, using hub nodes applied for the drug signature. From the identified drug signature, muramyl dipeptide CTD 00005307 and raloxifene CTD 00007367 are highly significant. Imiquimod (IMQ) is a heterocyclic amine that does not have nucleoside and belongs to the 1H-imidazole- [4, 5c] family [52][53][54][55]. In general, imiquimod acts indirectly as an immune reaction modification inducing immune reactions and the section of several cytokines which stimulate T cells in turn [56]. The innate immune system works by recognizing toxins in the body and activating a variety of cell types to destroy them. Imiquimod works in the innate immune system by binding it to cell surface receptors including peak receptors (TLRs). In the early stages of the disease where activation of innate immunity by a TLR-7 agonist is crucial, imiquimod's function as an active agent of the disease indicates its ability to treat viral infections such as SARS-CoV-2 [57]. Raloxifene is a modulatory selection of estrogen receptors for the treatment of postmenopausal osteoporosis and cancer that was approved for treatment and prevention by the FDA in 1997. Recently, raloxifene also demonstrated effectiveness in treating Ebola, influenza A, and hepatitis C virus viral infections and exhibited possible drug repurposing to combat SARS-CoV-2 infections [58]. The outcomes of this study have been validated from the gold benchmark databases OMIM and dbGaP. In the future, this research may help to expose the other disease complexity, pathways, and transcription factor for the SARS-CoV-2associated CF.

Conclusion
In this study, the identified biologic regions and regulatory components were briefly addressed, which may speed up clinical activity against SARS-CoV-2 associated with CF. The strength of our study should be taken into consideration, as it is the greatest transcriptomic study on SARS-CoV-2-associated CF. Transcriptomic analysis of this research has produced the common ontological entity, pathways, disease connectivity, and transferrin genes. In CFassociated SARS-CoV-2, the corresponding genes between SARS-CoV-2 and CF that generate more molecular findings and demonstrate the interaction of DEGs have been identified. In the future, this study may help to develop drug targets and treatment.

Data Availability
The datasets used and/or analyzed during the current study are available in NCBI. All of the resources are free and publicly accessible.